Skip to content

Conversation

@aheejin
Copy link
Member

@aheejin aheejin commented Nov 23, 2024

This updates libcxx and libcxxabi to LLVM 19.1.4:
https://github.com/llvm/llvm-project/releases/tag/llvmorg-19.1.4

The initial update was done using update_libcxx.py and update_libcxxabi.py, and subsequent fixes were made in indidual commits. The commit history here is kind of messy because of CI testing so not all individual commits are noteworthy.

Additional changes:


Other things to note:

@aheejin aheejin force-pushed the update_libcxx_libcxxabi_19 branch from 8936564 to f619917 Compare November 23, 2024 07:19
This is our emscripten-specific configuration file and was mistakenly
deleted when I ran update_libcxx.py.
This file was added as a part of LLVM 18 update (emscripten-core#21638) in
emscripten-core@8d51927
and mistakenly deleted when I ran update_libcxx.py. This file was copied
from
https://github.com/llvm/llvm-project/blob/aadaa00de76ed0c4987b97450dd638f63a385bed/libcxx/vendor/llvm/default_assertion_handler.in, so this also updates the file with the newest `default_assertion_handler.in`.
These Emscripten-specific files were mistakenly deleted when I ran
update_libcxx.py.
https://github.com/llvm/llvm-project/blob/aadaa00de76ed0c4987b97450dd638f63a385bed/libcxx/src/expected.cpp
was added in llvm/llvm-project#87390 and this
file assumes C++23 to be compiled. Apparently libc++ sources are always
built with C++23 so they don't guard things against it in `src/`:
llvm/llvm-project#87390 (comment)

This also bumps libc++abi to C++23 because... why not
We disabled C++20 time zone support in LLVM 18 update (emscripten-core#21638):
emscripten-core@df9af64

The list of source files related to time zone support has changed in
llvm/llvm-project#74928, so this commit reflects
it.
Only `char`, `wchar`, `char8`, `char16`, and `char32` are valid
specialization for `std::basic_string`:
https://en.cppreference.com/w/cpp/string/basic_string

But libc++ had a base template for `basic_string` that allows any type
to be passed for a long time. It looks there have been several attempts
to remove this but they restored it afterwards due to some complaints,
in chronological order:
llvm/llvm-project@aeecef0
llvm/llvm-project@08a0faf
llvm/llvm-project@e30a148
llvm/llvm-project#66153
llvm/llvm-project#72694

The last one, llvm/llvm-project#72694,
eventually removed it. So `std::basic_string<unsigned_char>` is not
allowed anymore.
@aheejin aheejin force-pushed the update_libcxx_libcxxabi_19 branch from 812cb80 to 643050f Compare December 4, 2024 02:48
Not sure why but some of them decreasd by ~3%. Increases don't seem to
be meaningful; they are usually ~0.3%.
Only `char`, `wchar`, `char8`, `char16`, and `char32` are valid
specialization for `std::basic_string`:
https://en.cppreference.com/w/cpp/string/basic_string

But libc++ had a base template for `basic_string` that allowed any type
to be passed for a long time. It looks there have been several attempts
to remove this after which they restored it due to complaints, in
chronological order:
llvm/llvm-project@aeecef0
llvm/llvm-project@08a0faf
llvm/llvm-project@e30a148
llvm/llvm-project#66153
llvm/llvm-project#72694

The last one, llvm/llvm-project#72694,
eventually removed it. So `std::basic_string<unsigned_char>` is not
allowed anymore. This removes all uses of
`std::basic_string<unsigned_char>` from embind.

This needs to be done to update libc++ to LLVM 19 (emscripten-core#22994). I'm
uploading this as a separate PR because this removes a functionality
from embind.
Only `char`, `wchar`, `char8`, `char16`, and `char32` are valid
specialization for `std::basic_string`:
https://en.cppreference.com/w/cpp/string/basic_string

But libc++ had a base template for `basic_string` that allowed any type
to be passed for a long time. It looks there have been several attempts
to remove this after which they restored it due to complaints, in
chronological order:
llvm/llvm-project@aeecef0
llvm/llvm-project@08a0faf
llvm/llvm-project@e30a148
llvm/llvm-project#66153
llvm/llvm-project#72694

The last one, llvm/llvm-project#72694,
eventually removed it. So `std::basic_string<unsigned_char>` is not
allowed anymore. This removes all uses of
`std::basic_string<unsigned_char>` from embind.

This needs to be done to update libc++ to LLVM 19 (emscripten-core#22994). I'm
uploading this as a separate PR because this removes a functionality
from embind.
hedwigz pushed a commit to hedwigz/emscripten that referenced this pull request Dec 18, 2024
Only `char`, `wchar`, `char8`, `char16`, and `char32` are valid
specializations for `std::basic_string`:
https://en.cppreference.com/w/cpp/string/basic_string

But libc++ had a base template for `basic_string` that allowed any type
to be passed for a long time. It looks there have been several attempts
to remove this after which they restored it due to complaints, in
chronological order:

llvm/llvm-project@aeecef0
llvm/llvm-project@08a0faf
llvm/llvm-project@e30a148
llvm/llvm-project#66153
llvm/llvm-project#72694

The last one, llvm/llvm-project#72694,
eventually removed it. So `std::basic_string<unsigned char>` is not
allowed anymore. This removes all uses of `std::basic_string<unsigned
char>` from embind.

This needs to be done to update libc++ to LLVM 19 (emscripten-core#22994). I'm
uploading this as a separate PR because this removes a functionality
from embind.
aheejin added a commit to aheejin/emscripten that referenced this pull request Dec 19, 2024
When `deterministic_paths` is set, we are currently using
`-ffile-prefix-map` to produce the same path in data and debug info. In
the case of absolute paths, their emscripten path is replaced with a
fake path `/emsdk/emscripten`, and in the case of relative paths, all
path relative to the emscripten directory is removed, so
`../../system/lib/somefile.c` becomes `system/lib/somefiles.c`.
https://github.com/emscripten-core/emscripten/blob/f66b5d706e174d9e5cc6122c06ea29dcd2735cd0/tools/system_libs.py#L472-L477
https://github.com/emscripten-core/emscripten/blob/f66b5d706e174d9e5cc6122c06ea29dcd2735cd0/tools/system_libs.py#L495-L501

So this does not make relative paths and absolute paths the same, which
can lead to different builds depending on whether the command line uses
absolute paths vs. relative ones. Currently we use relative paths when
`EMCC_BATCH_BUILD` is set. And Ninja builds cannot use relative paths.

This is also what was suggested in
emscripten-core#23195 (comment)
while discussins `__FILE__` problem in emscripten-core#23915.

This does not change any code size tests because no code size tests
happens to contain `__FILE__` at the moment. (One will be added in
 emscripten-core#22994)
aheejin added a commit that referenced this pull request Dec 19, 2024
#23222)

When `deterministic_paths` is set, we are currently using
`-ffile-prefix-map` to produce the same path in data and debug info. In
the case of absolute paths, their emscripten path is replaced with a
fake path `/emsdk/emscripten`, and in the case of relative paths, all
path relative to the emscripten directory is removed, so
`../../system/lib/somefile.c` becomes `system/lib/somefiles.c`.
https://github.com/emscripten-core/emscripten/blob/f66b5d706e174d9e5cc6122c06ea29dcd2735cd0/tools/system_libs.py#L472-L477
https://github.com/emscripten-core/emscripten/blob/f66b5d706e174d9e5cc6122c06ea29dcd2735cd0/tools/system_libs.py#L495-L501

So this does not make relative paths and absolute paths the same, which
can lead to different builds depending on whether the command line uses
absolute paths vs. relative ones. Currently we use relative paths when
`EMCC_BATCH_BUILD` is set. And Ninja builds cannot use relative paths.

This is also what was suggested in
#23195 (comment)
while discussins `__FILE__` problem in #23195.

This does not change any code size tests because no code size tests
happens to contain `__FILE__` at the moment. (One will be added in
#22994)
@aheejin aheejin self-assigned this Dec 21, 2024
This is an automatic change generated by tools/maint/rebaseline_tests.py.

The following (11) test expectation files were updated by
running the tests with `--rebaseline`:

```
code_size/embind_hello_wasm.json: 18011 => 17910 [-101 bytes / -0.56%]
code_size/embind_val_wasm.json: 17198 => 16938 [-260 bytes / -1.51%]
other/codesize/test_codesize_cxx_ctors1.size: 128894 => 129570 [+676 bytes / +0.52%]
other/codesize/test_codesize_cxx_ctors2.size: 128343 => 129019 [+676 bytes / +0.53%]
other/codesize/test_codesize_cxx_except.size: 170950 => 171248 [+298 bytes / +0.17%]
other/codesize/test_codesize_cxx_except_wasm.size: 142154 => 142510 [+356 bytes / +0.25%]
other/codesize/test_codesize_cxx_except_wasm_exnref.size: 144741 => 145056 [+315 bytes / +0.22%]
other/codesize/test_codesize_cxx_lto.size: 121844 => 122262 [+418 bytes / +0.34%]
other/codesize/test_codesize_cxx_mangle.size: 232468 => 233088 [+620 bytes / +0.27%]
other/codesize/test_codesize_cxx_noexcept.size: 131701 => 132146 [+445 bytes / +0.34%]
other/codesize/test_codesize_cxx_wasmfs.size: 168848 => 169407 [+559 bytes / +0.33%]

Average change: +0.08% (-1.51% - +0.53%)
```
@aheejin aheejin marked this pull request as ready for review January 3, 2025 00:05
@aheejin aheejin requested a review from sbc100 January 3, 2025 00:05
Copy link
Collaborator

@sbc100 sbc100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC from reading the comments non of the change in the system directory contains any new modifications (they all come from upstream and/or preexisting changes)?

# headers on emsdk-bundled clang
test_targets: "other skip:other.test_native_link_error_message"
test_targets: "
other.test_minimal_runtime_code_size_hello_embind"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you can revert this file?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I thought I reverted all CI testing changes but this remained... Reverted.

@aheejin
Copy link
Member Author

aheejin commented Jan 3, 2025

IIUC from reading the comments non of the change in the system directory contains any new modifications (they all come from upstream and/or preexisting changes)?

I had omitted a few changes before, but re-added all additional changes to the PR description now.

@aheejin aheejin merged commit 5e3ed77 into emscripten-core:main Jan 3, 2025
29 checks passed
@aheejin aheejin deleted the update_libcxx_libcxxabi_19 branch January 3, 2025 02:08
aheejin added a commit to aheejin/emscripten that referenced this pull request May 16, 2025
Our prevous `__assertion_handler` was copied from
`libcxx/vendor/llvm/default_assertion_handler.in`. This updates out
`__assertion_handler` to the new
`libcxx/vendor/llvm/default_assertion_handler.in`. For what this file
is, see the PR description of emscripten-core#22994, with which the file was added.
aheejin added a commit to aheejin/emscripten that referenced this pull request May 16, 2025
Our prevous `__assertion_handler` was copied from
`libcxx/vendor/llvm/default_assertion_handler.in`. This updates out
`__assertion_handler` to the new
`libcxx/vendor/llvm/default_assertion_handler.in`. For what this file
is, see the PR description of emscripten-core#22994, with which the file was added.
aheejin added a commit that referenced this pull request May 21, 2025
This updates libcxx and libcxxabi to LLVM 20.1.4:
https://github.com/llvm/llvm-project/releases/tag/llvmorg-20.1.4

Before going into each additional change of the PR, these are some
noteworthy changes from this LLVM release:

- Freezing C++03 headers
- RFC:
https://discourse.llvm.org/t/rfc-freezing-c-03-headers-in-libc/77319
  - PRs:
    - llvm/llvm-project#108999
    - llvm/llvm-project#109000
    - llvm/llvm-project#109001
    - llvm/llvm-project#109002
    - 
This copies libc++ headers of the last LLVM 19 release into a separate
directory
(https://github.com/llvm/llvm-project/tree/main/libcxx/include/__cxx03)
and redirects all C++03 workflow there. The motivation is not to fix
C++03 related changes unless they are critical bugs, and simplifies the
main headers. So the main headers are like
  ```
  #if __cplusplus < 201103L && defined(_LIBCPP_USE_FROZEN_CXX03_HEADERS)
  #  include <__cxx03/algorithm>
  #else
  ... main header content ...
  ```

As you can see it looks like we can avoid opting into this by not
defining `_LIBCPP_USE_FROZEN_CXX03_HEADERS` at the moment. I think their
eventual goal is to remove C++03 support from the main headers but it
hasn't seem to have happened yet and I don't know what their timeline
for that is.

Adding that `__cxx03` directory increases the header size by 10MB. We
can get away not using them by not defining
`_LIBCPP_USE_FROZEN_CXX03_HEADERS` in this release, so this update does
not include that directory.

- Sharing of headers between libc++ and libc (Project Hand in Hand)
- RFC:
https://discourse.llvm.org/t/rfc-project-hand-in-hand-llvm-libc-libc-code-sharing/77701
- PR: llvm/llvm-project#91651 (More are likely
to come)

This tries to share some libc headers from libcxx. libcxx's source files
can depend on headers from `libc/shared`, `libc/src/__support`,
`libc/include/llvm-libc-macros`, and `libc/include/llvm-libc-types`. And
it turns out libcxx can also depend on `libc/hdr`, even though the
directory is not in the diagram in the RFC.

These headers can be only included from `libcxx/src` and not
`libcxx/include`, to prevent libcxx's API from changing. So these
headers don't need to be copied into `cache/`.

- Locale API reimplementation
  It doesn't seem to have an RFC, but the PRs are:
  - llvm/llvm-project#113737
  - llvm/llvm-project#115176
  - llvm/llvm-project#115752
  - llvm/llvm-project#122489

It looks this aims provide a new way to define locale API for each
platform. Currently Apple, FreeBSD, MSVCRT, and Fuschia are using the
new API and the others are still using the old one:
https://github.com/llvm/llvm-project/blob/ec28b8f9cc7f2ac187d8a617a6d08d5e56f9120e/libcxx/include/__locale_dir/locale_base_api.h#L116-L138
For our purpose, adding `if defined(__EMSCRIPTEN__)` entry to the list
of "old" list seems to work for the moment, but we may need to move to
the new way eventually later.

---

Additional changes:

- Copy `vendor/llvm/default_assertion_handler.in` to
`__assertion_handler`:
aa53648
Our previous `__assertion_handler` was copied from
`libcxx/vendor/llvm/default_assertion_handler.in`. This updates our
`__assertion_handler` to the new
`libcxx/vendor/llvm/default_assertion_handler.in`. For what this file
is, see the PR description of #22994, with which the file was added.

- Remove `libcxx/include/__cxx03` directory:
d646f6b
As I described in "Freezing C++ headers" above, C++03 headers were
copied to `include/cxx03` to be "frozen". But by not defining
`_LIBCPP_USE_FROZEN_CXX03_HEADERS` we can still use the main headers.
This new `__cxx03` header directory is almost 10M and we are not using
it in this update, this deletes it.

- Define more variables in `__config_site`:
9912236
libcxx decides to almost all configuration macros to be defined. So
before it tested whether a macro was defined/undefined, and now it
assumes it is defined and tests whether its value is 1/0.

Before llvm/llvm-project#112094
`_config_site.in` used to use `#cmakedefine`, which only defined
variables when they were enabled, but now it uses `#cmakedefine01`,
which always defines variables and assigns 1 or 0 depending on whether
the feature is enabled.

So this change adds `#define` to each variable that `_config_site.in`
has an entry of. The value assigned is 1 if it was defined in our
previous Emscripten environment and 0 if not.

- Fix Emscripten's locale:
2ae01b0
Before, the code was like
https://github.com/emscripten-core/emscripten/blob/dc1abd514b1bade135a01a4453a9ff6def0793b6/system/lib/libcxx/include/__locale_dir/locale_base_api.h#L12-L30
But now they are divided into two parts, one using the new API and the
other using old. See "Locale API reimplementation" above.

https://github.com/llvm/llvm-project/blob/ec28b8f9cc7f2ac187d8a617a6d08d5e56f9120e/libcxx/include/__locale_dir/locale_base_api.h#L116-L138
This adds Emscripten in the beginning of the "old" API list. (This has
to be the beginning; see #23414)

- Import libc headers used by libcxx:
12a4ee4,
12a4ee4
and
43c8ce4
This imports a part of libc headers into `system/lib/llvm-libc`. The
imported directories are:
  - `libc/shared`
  - `libc/src/__support`
  - `libc/include/llvm-libc-macros`
  - `libc/include/llvm-libc-types`
  - `libc/hdr`
  See "sharing of headers between libc+ and libc" above for details.
This also applies llvm/llvm-project#133999,
which is a bugfix that has not be backported, which fixes the bug of
including from a wrong directory.

- `std::uncaught_exception` -> `std::uncaught_exceptions`:
1bf4e78
`std::uncaught_exception` has been deprecated in C++17, so it generates
a warning (which we treat as an error). Replaced it with
`std::uncaught_exceptions`, which returns the number of uncaught
exceptions (but can still be used as a boolean in the test).

- Remove `ryu_constants.h` and `ryu_long_double_constants.h` from libc:
5767ac4
These are not used from libcxx and `ryu_long_double_constants.h` is huge
(12M).
Lukasdoe pushed a commit to Lukasdoe/emscripten that referenced this pull request Jun 19, 2025
This updates libcxx and libcxxabi to LLVM 20.1.4:
https://github.com/llvm/llvm-project/releases/tag/llvmorg-20.1.4

Before going into each additional change of the PR, these are some
noteworthy changes from this LLVM release:

- Freezing C++03 headers
- RFC:
https://discourse.llvm.org/t/rfc-freezing-c-03-headers-in-libc/77319
  - PRs:
    - llvm/llvm-project#108999
    - llvm/llvm-project#109000
    - llvm/llvm-project#109001
    - llvm/llvm-project#109002
    - 
This copies libc++ headers of the last LLVM 19 release into a separate
directory
(https://github.com/llvm/llvm-project/tree/main/libcxx/include/__cxx03)
and redirects all C++03 workflow there. The motivation is not to fix
C++03 related changes unless they are critical bugs, and simplifies the
main headers. So the main headers are like
  ```
  #if __cplusplus < 201103L && defined(_LIBCPP_USE_FROZEN_CXX03_HEADERS)
  #  include <__cxx03/algorithm>
  #else
  ... main header content ...
  ```

As you can see it looks like we can avoid opting into this by not
defining `_LIBCPP_USE_FROZEN_CXX03_HEADERS` at the moment. I think their
eventual goal is to remove C++03 support from the main headers but it
hasn't seem to have happened yet and I don't know what their timeline
for that is.

Adding that `__cxx03` directory increases the header size by 10MB. We
can get away not using them by not defining
`_LIBCPP_USE_FROZEN_CXX03_HEADERS` in this release, so this update does
not include that directory.

- Sharing of headers between libc++ and libc (Project Hand in Hand)
- RFC:
https://discourse.llvm.org/t/rfc-project-hand-in-hand-llvm-libc-libc-code-sharing/77701
- PR: llvm/llvm-project#91651 (More are likely
to come)

This tries to share some libc headers from libcxx. libcxx's source files
can depend on headers from `libc/shared`, `libc/src/__support`,
`libc/include/llvm-libc-macros`, and `libc/include/llvm-libc-types`. And
it turns out libcxx can also depend on `libc/hdr`, even though the
directory is not in the diagram in the RFC.

These headers can be only included from `libcxx/src` and not
`libcxx/include`, to prevent libcxx's API from changing. So these
headers don't need to be copied into `cache/`.

- Locale API reimplementation
  It doesn't seem to have an RFC, but the PRs are:
  - llvm/llvm-project#113737
  - llvm/llvm-project#115176
  - llvm/llvm-project#115752
  - llvm/llvm-project#122489

It looks this aims provide a new way to define locale API for each
platform. Currently Apple, FreeBSD, MSVCRT, and Fuschia are using the
new API and the others are still using the old one:
https://github.com/llvm/llvm-project/blob/ec28b8f9cc7f2ac187d8a617a6d08d5e56f9120e/libcxx/include/__locale_dir/locale_base_api.h#L116-L138
For our purpose, adding `if defined(__EMSCRIPTEN__)` entry to the list
of "old" list seems to work for the moment, but we may need to move to
the new way eventually later.

---

Additional changes:

- Copy `vendor/llvm/default_assertion_handler.in` to
`__assertion_handler`:
emscripten-core@aa53648
Our previous `__assertion_handler` was copied from
`libcxx/vendor/llvm/default_assertion_handler.in`. This updates our
`__assertion_handler` to the new
`libcxx/vendor/llvm/default_assertion_handler.in`. For what this file
is, see the PR description of emscripten-core#22994, with which the file was added.

- Remove `libcxx/include/__cxx03` directory:
emscripten-core@d646f6b
As I described in "Freezing C++ headers" above, C++03 headers were
copied to `include/cxx03` to be "frozen". But by not defining
`_LIBCPP_USE_FROZEN_CXX03_HEADERS` we can still use the main headers.
This new `__cxx03` header directory is almost 10M and we are not using
it in this update, this deletes it.

- Define more variables in `__config_site`:
emscripten-core@9912236
libcxx decides to almost all configuration macros to be defined. So
before it tested whether a macro was defined/undefined, and now it
assumes it is defined and tests whether its value is 1/0.

Before llvm/llvm-project#112094
`_config_site.in` used to use `#cmakedefine`, which only defined
variables when they were enabled, but now it uses `#cmakedefine01`,
which always defines variables and assigns 1 or 0 depending on whether
the feature is enabled.

So this change adds `#define` to each variable that `_config_site.in`
has an entry of. The value assigned is 1 if it was defined in our
previous Emscripten environment and 0 if not.

- Fix Emscripten's locale:
emscripten-core@2ae01b0
Before, the code was like
https://github.com/emscripten-core/emscripten/blob/dc1abd514b1bade135a01a4453a9ff6def0793b6/system/lib/libcxx/include/__locale_dir/locale_base_api.h#L12-L30
But now they are divided into two parts, one using the new API and the
other using old. See "Locale API reimplementation" above.

https://github.com/llvm/llvm-project/blob/ec28b8f9cc7f2ac187d8a617a6d08d5e56f9120e/libcxx/include/__locale_dir/locale_base_api.h#L116-L138
This adds Emscripten in the beginning of the "old" API list. (This has
to be the beginning; see emscripten-core#23414)

- Import libc headers used by libcxx:
emscripten-core@12a4ee4,
emscripten-core@12a4ee4
and
emscripten-core@43c8ce4
This imports a part of libc headers into `system/lib/llvm-libc`. The
imported directories are:
  - `libc/shared`
  - `libc/src/__support`
  - `libc/include/llvm-libc-macros`
  - `libc/include/llvm-libc-types`
  - `libc/hdr`
  See "sharing of headers between libc+ and libc" above for details.
This also applies llvm/llvm-project#133999,
which is a bugfix that has not be backported, which fixes the bug of
including from a wrong directory.

- `std::uncaught_exception` -> `std::uncaught_exceptions`:
emscripten-core@1bf4e78
`std::uncaught_exception` has been deprecated in C++17, so it generates
a warning (which we treat as an error). Replaced it with
`std::uncaught_exceptions`, which returns the number of uncaught
exceptions (but can still be used as a boolean in the test).

- Remove `ryu_constants.h` and `ryu_long_double_constants.h` from libc:
emscripten-core@5767ac4
These are not used from libcxx and `ryu_long_double_constants.h` is huge
(12M).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants